Complete edition: 100 pandas exercises in Python, with answers (document format).docx
- Document ID: 19903269
- Uploaded: 2023-01-12
- Format: DOCX
- Pages: 13
- Size: 21.23 KB
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
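The definition of `data` does not appear in this excerpt. The exercises that follow refer to an animal, an age, a number of visits and a yes/no priority flag, so a minimal sketch of a compatible frame could look like the one below. The column names and every value here are illustrative assumptions, chosen only so that the later answers can be run end to end.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values (an assumption, not taken from this excerpt).
data = {
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)

# Quick sanity checks on the frame the exercises operate on.
n_rows, n_cols = df.shape
missing_ages = int(df['age'].isnull().sum())
```

With a plain dict, current pandas keeps the columns in insertion order, which matters for any answer that assigns a whole row as a list.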
5. Display a summary of the basic information about this DataFrame and its data.
[5]:
df.info()
# ...or...
df.describe()
6. Return the first 3 rows of the DataFrame df.
[6]:
df.iloc[:3]
# or equivalently
df.head(3)
7. Select just the 'animal' and 'age' columns from the DataFrame df.
[7]:
df.loc[:, ['animal', 'age']]
# or
df[['animal', 'age']]
8. Select the data in rows [3, 4, 8] and in columns ['animal', 'age'].
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
9. Select only the rows where the number of visits is greater than 3.
[4]:
df[df['visits'] > 3]
10. Select the rows where the age is missing, i.e. is NaN.
df[df['age'].isnull()]
11. Select the rows where the animal is a cat and the age is less than 3.
df[(df['animal'] == 'cat') & (df['age'] < 3)]
12. Select the rows where the age is between 2 and 4 (inclusive).
df[df['age'].between(2, 4)]
13. Change the age in row 'f' to 1.5.
[ ]:
df.loc['f', 'age'] = 1.5
14. Calculate the sum of all visits (the total number of visits).
df['visits'].sum()
15. Calculate the mean age for each different animal in df.
[8]:
df.groupby('animal')['age'].mean()
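16. Append a new row 'k' to df with your choice of values for each column. Then delete that row to return the original DataFrame.
# the list must follow the order of df.columns (assumed here: age, animal, priority, visits)
df.loc['k'] = [5.5, 'dog', 'no', 2]
# and then deleting the new row...
df = df.drop('k')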
17. Count the number of each type of animal in df.
[9]:
df['animal'].value_counts()
18. Sort df first by the values in the 'age' column in descending order, then by the values in the 'visits' column in ascending order.
[10]:
df.sort_values(by=['age', 'visits'], ascending=[False, True])
19. The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be True and 'no' should be False.
df['priority'] = df['priority'].map({'yes': True, 'no': False})
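20. In the 'animal' column, change the 'snake' entries to 'python'.
[14]:
df['animal'] = df['animal'].replace('snake', 'python')
print(df)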
21. For each animal type and each number of visits, find the mean age. In other words, each row is an animal, each column is a number of visits and the values are the mean ages (hint: use a pivot table).
[15]:
df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')
22. You have a DataFrame df with a column 'A' of integers. For example:
df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
How do you filter out rows which contain the same integer as the row immediately above?
[16]:
df.loc[df['A'].shift() != df['A']]
# Alternatively, we could use drop_duplicates() here. Note
# that this removes *all* duplicates though, so it won't give
# the same result when a value reappears later in the column.
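The difference between the `shift()` comparison and `drop_duplicates()` only shows up when a value comes back after a gap. The sketch below uses a small hypothetical column (not the one from the question) to make that visible.

```python
import pandas as pd

# Hypothetical column in which the value 1 recurs after a run of 2s.
df = pd.DataFrame({'A': [1, 1, 2, 2, 1, 1]})

# Keep a row only when it differs from the row directly above it.
consecutive = df.loc[df['A'].shift() != df['A'], 'A'].tolist()

# drop_duplicates() keeps only the first occurrence of each value overall.
global_unique = df['A'].drop_duplicates().tolist()

print(consecutive)    # [1, 2, 1]
print(global_unique)  # [1, 2]
```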
23. Given a DataFrame of numeric values, say
df = pd.DataFrame(np.random.random(size=(5, 3)))  # a 5x3 frame of float values
how do you subtract the row mean from each element in the row?
df.sub(df.mean(axis=1), axis=0)
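One way to convince yourself that `axis=0` is the right alignment is to check that every row of the result averages to zero. This is a small sketch with a seeded generator (the seed and the variable names are arbitrary choices, not part of the exercise).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the run repeatable
df = pd.DataFrame(rng.random(size=(5, 3)))

# df.mean(axis=1) has one entry per row; axis=0 aligns it on the row index.
centered = df.sub(df.mean(axis=1), axis=0)

# After centering, each row mean should be zero up to floating-point error.
row_means = centered.mean(axis=1)
rows_are_centered = bool(np.allclose(row_means, 0.0))
```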
24. Suppose you have a DataFrame with 10 columns of real numbers, for example:
df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('abcdefghij'))
Which column of numbers has the smallest sum? (Find that column's label.)
[17]:
df.sum().idxmin()
25. How do you count how many unique rows a DataFrame has (i.e. ignore all rows that are duplicates)?
len(df) - df.duplicated(keep=False).sum()
# or perhaps more simply...
len(df.drop_duplicates(keep=False))
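With `keep=False`, every member of a duplicated set is dropped, so the answer counts rows that occur exactly once rather than the number of distinct rows. A small sketch on a hypothetical frame shows the two counts side by side.

```python
import pandas as pd

# Hypothetical frame: (1, 'a') appears twice, (3, 'c') appears twice,
# (2, 'b') and (3, 'd') appear once each.
df = pd.DataFrame({'x': [1, 1, 2, 3, 3, 3],
                   'y': ['a', 'a', 'b', 'c', 'c', 'd']})

# Rows that never repeat (the reading used by the answer above).
never_repeated = len(df.drop_duplicates(keep=False))
via_mask = int(len(df) - df.duplicated(keep=False).sum())

# Number of distinct rows, for comparison (default keep='first').
distinct = len(df.drop_duplicates())
```

Here `never_repeated` and `via_mask` are both 2, while `distinct` is 4.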
26. You have a DataFrame that consists of 10 columns of floating-point numbers. Suppose that exactly 5 entries in each row are NaN values. For each row of the DataFrame, find the column which contains the third NaN value.
(You should return a Series of column labels.)
(df.isnull().cumsum(axis=1) == 3).idxmax(axis=1)
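The one-liner works because a running count of NaNs along each row first reaches 3 exactly at the third NaN, and `idxmax` returns the first column where the comparison is True. The sketch below traces this on a hypothetical 2x6 frame (smaller than the 10-column frame in the question, and with 3 or 4 NaNs per row rather than 5).

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: row 0 has NaNs in b, c, e; row 1 has NaNs in a, b, d, f.
df = pd.DataFrame(
    [[0.1, nan, nan, 0.4, nan, 0.6],
     [nan, nan, 0.3, nan, 0.5, nan]],
    columns=list('abcdef'),
)

# Running count of NaNs from left to right within each row.
running = df.isnull().cumsum(axis=1)

# First column in each row where the running count equals 3.
third_nan = (running == 3).idxmax(axis=1)

print(third_nan.tolist())  # ['e', 'd']
```

If a row had fewer than three NaNs, the comparison would be False everywhere and `idxmax` would fall back to the first column, so the approach relies on the guarantee stated in the question.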
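27. A DataFrame has a column of groups 'grps' and a column of numbers 'vals'. For example:
df = pd.DataFrame({'grps': list('aaabbcaabcccbbc'),
                   'vals': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})
For each group, find the sum of the three greatest values.
# sum(level=0) is no longer available in pandas 2.x; group on the index level instead
df.groupby('grps')['vals'].nlargest(3).groupby(level=0).sum()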
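As a check on the example data, the three largest values per group can be inspected before they are summed. The sketch below sums with `groupby(level=0).sum()`, which is one way to aggregate over the first index level in current pandas; the intermediate names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'grps': list('aaabbcaabcccbbc'),
                   'vals': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})

# nlargest(3) per group returns a Series indexed by (group, original row label).
top3 = df.groupby('grps')['vals'].nlargest(3)

# Sum over the first index level, i.e. per group.
totals = top3.groupby(level=0).sum()

print(totals.to_dict())  # {'a': 409, 'b': 156, 'c': 345}
```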
28. A DataFrame has two integer columns 'A' and 'B'. The values in 'A' are between 1 and 100 (inclusive). For each group of 10 consecutive integers in 'A' (i.e. (0, 10], (10, 20], ...), calculate the sum of the corresponding values in column 'B'.
df.groupby(pd.cut(df['A'], np.arange(0, 101, 10)))['B'].sum()
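29. Consider a DataFrame df where there is an integer column 'X':
df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})
For each value, count the difference back to the previous zero (or the start of the Series, whichever is closer). These values should therefore be [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]. Make this a new column 'Y'.
# Series.nonzero() was removed from pandas, so go through the NumPy array
izero = np.r_[-1, (df['X'] == 0).to_numpy().nonzero()[0]]  # indices of zeros
idx = np.arange(len(df))
df['Y'] = idx - izero[np.searchsorted(izero - 1, idx) - 1]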
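The same column can also be built without `searchsorted`. The following is an alternative sketch, not the answer given above: it labels each stretch between zeros with a block number and counts positions inside each block. The names `is_zero` and `block` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

is_zero = df['X'].eq(0)
block = is_zero.cumsum()  # 0 before the first zero, then +1 at every zero

# Position inside each block: blocks that start at a zero count from 0.
y = df.groupby(block).cumcount()

# Rows before the first zero measure from the start of the Series, so add 1.
df['Y'] = y + (block == 0).astype(int)

print(df['Y'].tolist())  # [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]
```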
30. Consider a DataFrame containing rows and columns of purely numerical data. Create a list of the row-column index locations of the 3 largest values.
# note: each tuple in the result is (column label, row label)
df.unstack().sort_values()[-3:].index.tolist()
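31. Given a DataFrame with a column of group IDs, 'grps', and a column of corresponding integer values, 'vals', replace any negative values in 'vals' with the group mean.
def replace(group):
    mask = group < 0
    # keep the non-negative entries; fill the rest with the mean of the non-negative ones
    return group.where(~mask, group[~mask].mean())

df.groupby(['grps'])['vals'].transform(replace)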
32. Implement a rolling mean over groups with window size 3, which ignores NaN values. For example, consider the following DataFrame:
>>> df = pd.DataFrame({'group': list('aabbabbbabab'),
...                    'value': [1, 2, 3, np.nan, 2, 3,
...                              np.nan, 1, 7, 3, np.nan, 8]})
>>> df
   group  value
0      a    1.0
1      a    2.0
2      b    3.0
3      b    NaN
4      a    2.0
5      b    3.0
6      b    NaN
7      b    1.0
8      a    7.0
9      b    3.0
10     a    NaN
11     b    8.0
The goal is to compute the Series:
0     1.000000
1     1.500000
2     3.000000
3     3.000000
4     1.666667
5     3.000000
6     3.000000
7     2.000000
8     3.666667
9     2.000000
10    4.500000
11    4.000000
g1 = df.groupby(['group'])['value']  # group values
g2 = df.fillna(0).groupby(['group'])['value']  # fillna, then group values
s = g2.rolling(3, min_periods=1).sum() / g1.rolling(3, min_periods=1).count()  # compute means
s.reset_index(level=0, drop=True).sort_index()  # drop the group level, restore row order
33. Create a DatetimeIndex that contains each business day of 2015 and use it to index a Series of random numbers. Let's call this Series s.
dti = pd.date_range(start='2015-01-01', end='2015-12-31', freq='B')
s = pd.Series(np.random.rand(len(dti)), index=dti)
34. Find the sum of the values in s for every Wednesday.
s[s.index.weekday == 2].sum()
35. For each calendar month in s, find the mean of values.
s.resample('M').mean()  # pandas 2.2 and later spell this alias 'ME'
36. For each group of four consecutive calendar months in s, find the date on which the highest value occurred.
# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it
s.groupby(pd.Grouper(freq='4M')).idxmax()
37. Create a DatetimeIndex consisting of the third Thursday in each month for the years 2015 and 2016.
pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')
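The `WOM-3THU` alias reads as "week of month, third Thursday". A quick way to check the result is to confirm that every date is a Thursday falling on day 15 to 21 of its month, as in this sketch (the variable names are illustrative).

```python
import pandas as pd

third_thursdays = pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')

# weekday uses Monday=0, so Thursday is 3.
all_thursday = bool((third_thursdays.weekday == 3).all())

# The third occurrence of any weekday always lands on day 15..21.
in_third_week = bool(((third_thursdays.day >= 15) & (third_thursdays.day <= 21)).all())

n_dates = len(third_thursdays)  # one per month over two years
```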
38. Some values in the FlightNumber column are missing. These numbers are meant to increase by 10 with each row so 10055 and 10075 need to be put in place. Fill in these missing numbers and make the column an integer column (instead of a float column).
df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)
39. The From_To column would be better as two separate columns! Split each string on the underscore delimiter _ to give a new temporary DataFrame with the correct values. Assign the correct column names to this temporary DataFrame.
temp = df.From_To.str.split('_', expand=True)
temp.columns = ['From', 'To']
40. Notice how the capitalisation of the city names is all mixed up in this temporary DataFrame. Standardise the strings so that only the first letter is uppercase (e.g. "londON" should become "London").
temp['From'] = temp['From'].str.capitalize()
temp['To'] = temp['To'].str.capitalize()
41. Delete the From_To column from df and attach the temporary DataFrame from the previous questions.
df = df.drop('From_To', axis=1)
df = df.join(temp)
42. In the Airline column, you can see some extra punctuation and symbols have appeared around the airline names. Pull out just the airline name. E.g. '(British Airways. )' should become 'British Airways'.
df['Airline'] = df['Airline'].str.extract(r'([a-zA-Z\s]+)', expand=False).str.strip()
# note: using .strip() gets rid of any leading/trailing whitespace
43. In the RecentDelays column, the values have been entered into the DataFrame as a list. We would like each first value in its own column, each second value in its own column, and so on. If there isn't an Nth value, the value should be NaN.
Expand the Series of lists into a DataFrame named delays, rename the columns delay_1, delay_2, etc. and replace the unwanted RecentDelays column in df with delays.
delays = df['RecentDelays'].apply(pd.Series)
delays.columns = ['delay_{}'.format(n) for n in range(1, len(delays.columns) + 1)]
df = df.drop('RecentDelays', axis=1).join(delays)
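The flight DataFrame that questions 38 to 43 operate on is not shown in this excerpt. The sketch below runs those steps in sequence on a hypothetical frame; the column names come from the questions, but every value is an assumption made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical flight data (assumed; the excerpt does not show the real frame).
df = pd.DataFrame({
    'From_To': ['LoNDon_paris', 'MAdrid_miLAN', 'londON_StockhOlm',
                'Budapest_PaRis', 'Brussels_londOn'],
    'FlightNumber': [10045, np.nan, 10065, np.nan, 10085],
    'RecentDelays': [[23, 47], [], [24, 43, 87], [13], [67, 32]],
    'Airline': ['KLM(!)', '<Air France> (12)', '(British Airways. )',
                '12. Air France', '"Swiss Air"'],
})

# 38: fill the gaps by linear interpolation, then cast to int.
df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)

# 39-41: split the route, normalise the capitalisation, swap the columns in.
temp = df['From_To'].str.split('_', expand=True)
temp.columns = ['From', 'To']
temp['From'] = temp['From'].str.capitalize()
temp['To'] = temp['To'].str.capitalize()
df = df.drop('From_To', axis=1).join(temp)

# 42: keep the first run of letters and spaces, then trim whitespace.
df['Airline'] = df['Airline'].str.extract(r'([a-zA-Z\s]+)', expand=False).str.strip()

# 43: one column per list position; short lists are padded with NaN.
delays = df['RecentDelays'].apply(pd.Series)
delays.columns = ['delay_{}'.format(n) for n in range(1, len(delays.columns) + 1)]
df = df.drop('RecentDelays', axis=1).join(delays)

print(df)
```

Interpolation recovers the missing flight numbers only because the known values are evenly spaced; with irregular gaps a different fill rule would be needed.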
44. Given the lists letters = ['A', 'B', 'C'] and num